Individual Fairness in Pipelines
It is well understood that a system built from individually fair components
may not itself be individually fair. In this work, we investigate individual
fairness under pipeline composition. Pipelines differ from ordinary sequential
or repeated composition in that individuals may drop out at any stage, and
classification in subsequent stages may depend on the remaining "cohort" of
individuals. As an example, a company might hire a team for a new project and
at a later point promote the highest performer on the team. Unlike other
repeated classification settings, where the degree of unfairness degrades
gracefully over multiple fair steps, the degree of unfairness in pipelines can
be arbitrary, even in a pipeline with just two stages.
Guided by a panoply of real-world examples, we provide a rigorous framework
for evaluating different types of fairness guarantees for pipelines. We show
that naïve auditing is unable to uncover systematic unfairness and that, in
order to ensure fairness, some form of dependence must exist between the design
of algorithms at different stages in the pipeline. Finally, we provide
constructions that permit flexibility at later stages, meaning that there is no
need to lock in the entire pipeline at the time that the early stage is
constructed.
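To make the cohort dependence concrete, here is a minimal, hypothetical Python sketch of the hire-then-promote example: the second stage ranks only the surviving cohort, so an individual's final outcome depends on who else passed the first stage. All names and scoring functions are illustrative assumptions, not the paper's construction.

```python
import random

def hire(candidates, k, score):
    """Stage 1: an (assumed) individually fair hiring rule --
    select the top-k candidates by a score that treats similar
    candidates similarly."""
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:k]

def promote(cohort, performance):
    """Stage 2: promote the single highest performer *within the
    surviving cohort*.  The outcome for any individual now depends
    on the rest of the cohort, which is exactly the effect that
    breaks naive composition of fair stages."""
    return max(cohort, key=performance)

# Hypothetical example: candidates are ids with latent qualities.
random.seed(0)
candidates = list(range(20))
quality = {c: random.random() for c in candidates}

team = hire(candidates, k=5, score=lambda c: quality[c])
star = promote(team, performance=lambda c: quality[c])
print("hired:", team, "promoted:", star)
```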
Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations
Consider a database of $n$ people, each represented by a bit-string of length
$d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal
query is specified by a subset $S$ of $k$ attributes, and a $k$-dimensional
binary vector $v$ specifying their values. The result for this query is a
count of the number of people in the database whose attribute vector restricted
to $S$ agrees with $v$.
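As a concrete illustration, here is a minimal Python sketch of evaluating a single $k$-way marginal query on a binary database; the function names and the Laplace-noise step are illustrative assumptions, not the paper's mechanism.

```python
import numpy as np

def marginal_query(db, S, v):
    """Count the rows of a 0/1 database whose attributes restricted
    to the index set S agree with the binary vector v."""
    return int(np.sum(np.all(db[:, S] == v, axis=1)))

def noisy_marginal(db, S, v, eps, rng):
    """Release one marginal via the basic Laplace mechanism
    (sensitivity 1), shown only to make 'additive error' concrete;
    the paper's algorithm answers many marginals far more accurately."""
    return marginal_query(db, S, v) + rng.laplace(scale=1.0 / eps)

# Hypothetical example: n = 1000 people, d = 8 binary attributes.
rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(1000, 8))
S, v = [1, 4, 6], np.array([1, 0, 1])   # a 3-way marginal query
print(marginal_query(db, S, v), noisy_marginal(db, S, v, eps=0.5, rng=rng))
```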
Privately releasing approximate answers to a set of $k$-way marginal queries
is one of the most important and well-motivated problems in differential
privacy. Information theoretically, the error complexity of marginal queries is
well-understood: the per-query additive error is known to be at least
$\tilde{\Omega}(\min\{\sqrt{n},\, d^{k/2}\})$ and at most
$\tilde{O}(\min\{\sqrt{n}\cdot d^{1/4},\, d^{k/2}\})$. However, no polynomial
time algorithm with error complexity as low as the information theoretic upper
bound is known for small $n$. In this work we present a polynomial time
algorithm that, for any distribution on marginal queries, achieves average
error at most $\tilde{O}(\sqrt{n}\cdot d^{\lceil k/2\rceil/4})$. This error
bound is as good as the best known information theoretic upper bounds for
$k=2$. This bound is an improvement over previous work on efficiently releasing
marginals when $k$ is small and when error $o(n)$ is desirable. Using private
boosting we are also able to give nearly matching worst-case error bounds.
Our algorithms are based on the geometric techniques of Nikolov, Talwar, and
Zhang. The main new ingredients are convex relaxations and careful use of the
Frank-Wolfe algorithm for constrained convex minimization. To design our
relaxations, we rely on the Grothendieck inequality from functional analysis.
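Since Frank-Wolfe does the optimization work here, a generic sketch may help. This is textbook Frank-Wolfe on a hypothetical least-squares objective over the $\ell_1$ ball, not the paper's convex relaxation of the marginal queries.

```python
import numpy as np

def frank_wolfe_l1(A, b, radius, steps=200):
    """Minimize f(x) = 0.5 * ||Ax - b||^2 over the l1 ball of the
    given radius.  Each iteration calls a linear minimization oracle
    over the constraint set (for the l1 ball: a signed, scaled basis
    vector) and moves with the standard step size 2 / (t + 2)."""
    n = A.shape[1]
    x = np.zeros(n)                       # any feasible starting point
    for t in range(steps):
        grad = A.T @ (A @ x - b)          # gradient of f at x
        i = np.argmax(np.abs(grad))       # LMO: best vertex of the l1 ball
        s = np.zeros(n)
        s[i] = -radius * np.sign(grad[i])
        gamma = 2.0 / (t + 2.0)           # standard step-size schedule
        x = (1 - gamma) * x + gamma * s   # convex combination stays feasible
    return x

# Hypothetical example with a sparse ground truth.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true
print(np.round(frank_wolfe_l1(A, b, radius=3.5), 2))
```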
Improved Generalization Guarantees in Restricted Data Models
Differential privacy is known to protect against threats to validity incurred
due to adaptive, or exploratory, data analysis -- even when the analyst
adversarially searches for a statistical estimate that diverges from the true
value of the quantity of interest on the underlying population. The cost of
this protection is the accuracy loss incurred by differential privacy. In this
work, inspired by standard models in the genomics literature, we consider data
models in which individuals are represented by a sequence of attributes with
the property that distant attributes are only weakly correlated. We show
that, under this assumption, it is possible to "re-use" privacy budget on
different portions of the data, significantly improving accuracy without
increasing the risk of overfitting.
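A minimal sketch of the budget re-use idea, under assumed names and a toy Laplace mechanism: when attribute blocks are (by assumption) only weakly correlated, each block's analysis can be run with close to the full per-block budget instead of splitting one budget across all blocks. This illustrates the high-level idea only, not the paper's analysis.

```python
import numpy as np

def noisy_block_means(db, blocks, eps, rng):
    """Answer one mean query per attribute block with the Laplace
    mechanism.  Naive composition would charge eps / len(blocks)
    per block; under the weak-correlation assumption the idea is
    that each block can re-use (close to) the full budget eps,
    shrinking the noise by roughly a factor of len(blocks)."""
    n = db.shape[0]
    answers = {}
    for name, cols in blocks.items():
        true_mean = db[:, cols].mean()
        # a mean over n rows of 0/1 attributes has sensitivity 1/n
        answers[name] = true_mean + rng.laplace(scale=1.0 / (n * eps))
    return answers

# Hypothetical example: 3 weakly correlated blocks of attributes.
rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(500, 9))
blocks = {"A": [0, 1, 2], "B": [3, 4, 5], "C": [6, 7, 8]}
print(noisy_block_means(db, blocks, eps=1.0, rng=rng))
```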
Abstracting Fairness: Oracles, Metrics, and Interpretability
It is well understood that classification algorithms, for example, for
deciding on loan applications, cannot be evaluated for fairness without taking
context into account. We examine what can be learned from a fairness oracle
equipped with an underlying understanding of ``true'' fairness. The oracle
takes as input a (context, classifier) pair satisfying an arbitrary fairness
definition, and accepts or rejects the pair according to whether the classifier
satisfies the underlying fairness truth. Our principal conceptual result is an
extraction procedure that learns the underlying truth; moreover, the procedure
can learn an approximation to this truth given access to a weak form of the
oracle. Since every ``truly fair'' classifier induces a coarse metric, in which
those receiving the same decision are at distance zero from one another and
those receiving different decisions are at distance one, this extraction
process provides the basis for ensuring a rough form of metric fairness, also
known as individual fairness. Our principal technical result is a higher
fidelity extractor under a mild technical constraint on the weak oracle's
conception of fairness. Our framework permits the scenario in which many
classifiers, with differing outcomes, may all be considered fair. Our results
have implications for interpretability -- a highly desired but poorly defined
property of classification systems that endeavors to permit a human arbiter to
reject classifiers deemed to be ``unfair'' or illegitimately derived.
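The coarse metric induced by a fair classifier is simple enough to state in code; this hypothetical sketch just spells out the definition used above.

```python
def coarse_metric(classifier):
    """Given a classifier assumed to be 'truly fair', return the
    induced coarse metric: distance 0 between individuals receiving
    the same decision, distance 1 otherwise.  Metric (individual)
    fairness with respect to this metric then requires individuals
    at distance 0 to be treated identically."""
    def d(x, y):
        return 0 if classifier(x) == classifier(y) else 1
    return d

# Hypothetical usage with a toy threshold classifier.
f = lambda score: score >= 0.5
d = coarse_metric(f)
print(d(0.7, 0.9), d(0.7, 0.2))   # prints: 0 1
```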
Proving Differential Privacy with Shadow Execution
Recent work on formal verification of differential privacy shows a trend
toward usability and expressiveness -- generating correctness proofs of
sophisticated algorithms while minimizing the annotation burden on programmers.
Sometimes, combining those two requires substantial changes to program logics:
one recent paper is able to verify Report Noisy Max automatically, but it
involves a complex verification system using customized program logics and
verifiers.
In this paper, we propose a new proof technique, called shadow execution, and
embed it into a language called ShadowDP. ShadowDP uses shadow execution to
generate proofs of differential privacy with very few programmer annotations
and without relying on customized logics and verifiers. In addition to
verifying Report Noisy Max, we show that it can verify a new variant of Sparse
Vector that reports the gap between some noisy query answers and the noisy
threshold. Moreover, ShadowDP reduces the complexity of verification: for all
of the algorithms we have evaluated, type checking and verification in total
takes at most 3 seconds, while prior work takes minutes on the same algorithms.
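For reference, here is a minimal Python sketch of Report Noisy Max, the kind of algorithm ShadowDP is used to verify; this is the standard textbook mechanism, not ShadowDP's verified source code.

```python
import numpy as np

def report_noisy_max(query_answers, eps, rng):
    """Report Noisy Max: add independent Laplace(2/eps) noise to each
    query answer (each assumed to have sensitivity 1) and return only
    the index of the largest noisy value.  Releasing just the argmax,
    rather than all noisy answers, is what makes the mechanism
    eps-differentially private."""
    noisy = np.asarray(query_answers, dtype=float) \
            + rng.laplace(scale=2.0 / eps, size=len(query_answers))
    return int(np.argmax(noisy))

# Hypothetical usage on four counting-query answers.
rng = np.random.default_rng(0)
print(report_noisy_max([10, 42, 37, 41], eps=0.5, rng=rng))
```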